5,093 research outputs found

    Acquiring Knowledge from Pre-trained Model to Neural Machine Translation

    Pre-training and fine-tuning have achieved great success in natural language processing. The standard paradigm has two steps: first, pre-training a model, e.g. BERT, on large-scale unlabeled monolingual data; then, fine-tuning the pre-trained model on labeled data from downstream tasks. However, in neural machine translation (NMT), the training objective of the bilingual task differs substantially from that of the monolingual pre-trained model. Because of this gap, fine-tuning alone cannot fully exploit prior language knowledge in NMT. In this paper, we propose the APT framework for acquiring knowledge from pre-trained models for NMT. The proposed approach includes two modules: (1) a dynamic fusion mechanism that fuses task-specific features adapted from general knowledge into the NMT network, and (2) a knowledge distillation paradigm that learns language knowledge continuously during NMT training. The approach can integrate suitable knowledge from pre-trained models to improve NMT. Experimental results on WMT English-to-German, German-to-English, and Chinese-to-English machine translation tasks show that our model outperforms strong baselines and the fine-tuning counterparts.
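The knowledge distillation idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the APT framework itself: the abstract gives no loss formula, so the temperature-softened KL term, the mixing weight `alpha`, and all function names below are assumptions chosen for illustration, following the common distillation recipe.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Assumption: the teacher stands in for a pre-trained model and the
    student for the NMT model; the abstract does not specify this form.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)

def combined_loss(student_logits, teacher_logits, target_index,
                  alpha=0.5, temperature=2.0):
    """Mix the usual cross-entropy with the distillation term.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    ce = -math.log(softmax(student_logits)[target_index])
    kd = distillation_loss(student_logits, teacher_logits, temperature)
    return (1.0 - alpha) * ce + alpha * kd
```

With `alpha=0` the loss reduces to plain cross-entropy, and the distillation term vanishes when student and teacher produce identical logits, which makes the two contributions easy to inspect separately.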

    Forecasting the All-Weather Short-Term Metro Passenger Flow Based on Seasonal and Nonlinear LSSVM

    Accurate metro ridership prediction can guide passengers in efficiently selecting their departure time and simultaneously help traffic operators develop a passenger organization strategy. However, short-term passenger flow prediction needs to consider many factors, and the results of existing models for short-term subway passenger flow forecasting are often unsatisfactory. To address this, we propose a parallel architecture, called the seasonal and nonlinear least squares support vector machine (SN-LSSVM), to extract the periodicity and nonlinearity characteristics of passenger flow. Various forecasting models, including auto-regressive integrated moving average, long short-term memory network, and support vector machine, are employed to evaluate the performance of the proposed architecture. Moreover, we first applied the method to Tiyu Xilu station, which is the most crowded station in the Guangzhou metro. The results indicate that the proposed model can effectively make all-weather and year-round passenger flow predictions, thus contributing to the management of the station.
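For readers unfamiliar with the least squares support vector machine underlying SN-LSSVM, a minimal sketch of plain LSSVM regression follows. It covers only the standard LSSVM linear system with an RBF kernel on a one-dimensional input; the seasonal component and the parallel architecture described in the abstract are not reproduced, and the function names and the `gamma` and `sigma` values are assumptions for illustration.

```python
import math

def rbf(a, b, sigma):
    """Gaussian (RBF) kernel on scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvm_fit(xs, ys, gamma=100.0, sigma=1.0):
    """Fit LSSVM regression by solving
    [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
    Returns (b, alpha)."""
    n = len(xs)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(xs[i], xs[j], sigma)
                          + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(ys))
    return sol[0], sol[1:]

def lssvm_predict(x, xs, alpha, b, sigma=1.0):
    """Evaluate f(x) = b + sum_i alpha_i * k(x, x_i)."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, xs))

# Toy periodic series standing in for passenger counts (synthetic data).
xs = [0.5 * i for i in range(8)]
ys = [math.sin(x) for x in xs]
b, alpha = lssvm_fit(xs, ys)
```

Unlike a standard SVM, which requires solving a quadratic program, LSSVM training reduces to this single linear solve; the regularization parameter `gamma` controls how closely the fit tracks the training targets.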

    A SEEMINGLY UNRELATED REGRESSION ANALYSIS ON THE TRADING BEHAVIOR OF MUTUAL FUND INVESTORS

    This paper provides a comprehensive investigation of the causal relationship between fund performance and trading flows. We analyze whether investors behave asymmetrically in fund purchasing and selling using seemingly unrelated regression, which comprises several individual relationships linked by the fact that their disturbances, or error terms, are correlated. The empirical results show a significantly negative relationship between fund performance and purchase flows for domestic funds. The magnitude of redemptions negatively affects current return for domestic funds, but not for international funds. While the previous fund return positively affects current net flows, further lagged performance has no significant impact on trading flows, revealing that fund investors are sensitive only to short-term past performance. Most importantly, while negative fund performance leads to increases in redemption, positive performance, contrarily, leads to decreases in purchases. The evidence strongly indicates asymmetric behavior of fund investors in the return-purchase causality relation.

    A Restricted Black-box Adversarial Framework Towards Attacking Graph Embedding Models

    With the great success of graph embedding models in both academia and industry, the robustness of graph embedding against adversarial attacks inevitably becomes a central problem in graph learning. Despite fruitful progress, most current works perform the attack in a white-box fashion: they need access to the model predictions and labels to construct their adversarial loss. However, the inaccessibility of model predictions in real systems makes white-box attacks impractical for real graph learning systems. This paper extends current frameworks in a more general and flexible direction: we aim to attack various kinds of graph embedding models in a black-box setting. To this end, we begin by investigating the theoretical connections between graph signal processing and graph embedding models in a principled way, and formulate the graph embedding model as a general graph signal process with a corresponding graph filter. On this basis, a generalized adversarial attacker, GF-Attack, is constructed from the graph filter and feature matrix. Instead of accessing any knowledge of the target classifiers used in graph embedding, GF-Attack performs the attack only on the graph filter in a black-box fashion. To validate the generalization of GF-Attack, we construct the attacker on four popular graph embedding models. Extensive experimental results validate the effectiveness of our attacker on several benchmark datasets. In particular, with our attack, even a small graph perturbation such as a one-edge flip is able to consistently mount a strong attack on the performance of different graph embedding models. Comment: Accepted by the AAAI 202